Volume 15 - Issue 6

Mini Review, Biomedical Science and Research. Licensed under Creative Commons (CC-BY).

Methodology of Non-probability Sampling in Survey Research

*Corresponding author: Kyu-Seong Kim, Professor, Department of Statistics, University of Seoul, South Korea.

Received: March 14, 2022; Published: March 21, 2022

DOI: 10.34297/AJBSR.2022.15.002166

Introduction

Since the mid-20th century, the probability sampling paradigm has been the mainstream methodology for sampling and inference in most surveys [1]. Large-scale national surveys conducted by national statistical offices and similar institutions in particular are mostly based on this paradigm, because it provides these institutions with objective statistics. Probability sampling usually requires a well-constructed sampling frame, a sound sampling design, and a high response rate.

Recently, the probability sampling paradigm has faced a serious challenge because of decreasing population coverage and increasing non-response, coupled with the rising cost of sample surveys. At the same time, the number of surveys using non-probability samples, such as web surveys, is growing. In this situation, interest in the non-probability sampling paradigm as an alternative to the probability sampling paradigm has been increasing [1,2].

Sample surveys with non-probability samples, as well as with probability samples, have been carried out consistently. Non-probability samples have the advantages of faster data collection, lower survey cost, and easier access to potential respondents. Their weaknesses, however, are the lack of control over selection bias and the difficulty of statistical inference. As a result, the use of non-probability samples remains controversial in survey research. Some current dominant views on sampling are that “researchers should avoid nonprobability online panels when one of the research objectives is to accurately estimate population values” [3], or that “statistical inference is impossible without probability sampling or that the sampling method is irrelevant to inference” [1].

Nevertheless, non-probability samples have been commonly used in areas such as case-control studies, clinical trials, and observational studies, because these research settings often make non-probability samples convenient or unavoidable. Naturally, as the number of non-probability sample surveys increases, so will the need for methodology based on non-probability samples.

Traditionally in survey research, theory has followed practical demands rather than driven them. A typical example is the replacement of complete enumeration by sample surveys in the early 20th century. This happened not because of the theoretical superiority of sample surveys, but because of rapidly rising demand for much faster results, which sample surveys could provide. The theory of sample surveys was then established over time.

The theoretical development of non-probability sampling in survey research is likely to follow a similar path. If the number of surveys with non-probability samples keeps increasing, corresponding theoretical development can be expected. This expectation is reasonable because sampling theory is a strategy, not a dogma [4,5]. That is, sampling theory is not an absolute principle but a powerful strategy for obtaining objective results in survey research. So, if we fully understand the principles of sampling as a strategy, we can seek an appropriate methodology for non-probability samples in survey research.

Non-probability Sampling in Survey Research

In survey research, randomization means the random allocation of units in experiments or the random selection of sampling units in sample surveys. Randomization makes two contributions to survey research. First, it can guarantee the objectivity of survey results, because it removes the researcher’s subjective selection bias. This is a major contribution to survey research and to science more broadly [6].

The second contribution is that the sampling distribution generated by randomization can provide a basis for statistical inference in survey research [6,7]. Such inference is called randomization-based or design-based inference. Strictly speaking, the randomization distribution differs from a distribution describing uncertainty about the quantities of interest, so some argue that inference based on the randomization distribution is not valid, even though the randomization distribution itself is valid as a sampling distribution [8].
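To make the idea of a randomization distribution concrete, the sketch below enumerates every simple random sample without replacement (SRSWOR) of size n from a small population and records the probability of each value of the sample mean. The population values, the design, and the function name `randomization_distribution` are illustrative assumptions. The point is only that this distribution is generated by the random selection itself, and that its expectation equals the population mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def randomization_distribution(population, n):
    """Return {sample_mean: probability} over all SRSWOR samples of size n."""
    samples = list(combinations(population, n))
    p = Fraction(1, len(samples))  # every sample is equally likely under SRSWOR
    dist = Counter()
    for s in samples:
        dist[Fraction(sum(s), n)] += p
    return dict(dist)


# Hypothetical toy population of N = 5 unit values (population mean 6).
population = [2, 4, 6, 8, 10]
dist = randomization_distribution(population, n=2)

# Expectation of the sample mean over the randomization distribution.
expected = sum(m * p for m, p in dist.items())
print(expected)  # → 6
```

Exact fractions are used so that the design-unbiasedness of the sample mean shows up exactly rather than up to rounding error.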

Non-probability sampling is defined as any sampling that is not probability sampling [9]. It occurs when the sample is not selected randomly, or when the inclusion probabilities of the units are unknown even though selection is random [9,10]. For example, quota sampling, judgment sampling, and volunteer sampling are considered non-probability sampling.
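A standard way to see why known inclusion probabilities matter, included here only as a familiar example and not as part of the original argument, is the Horvitz-Thompson estimator of a population total $Y = \sum_{i=1}^{N} y_i$, where unit $i$ enters the sample $s$ with inclusion probability $\pi_i$ and $E_p$ denotes expectation over repeated sampling under the design:

```latex
\hat{Y}_{\mathrm{HT}} = \sum_{i \in s} \frac{y_i}{\pi_i} = \sum_{i \in s} d_i \, y_i ,
\qquad d_i = \frac{1}{\pi_i},
\qquad E_p\!\left(\hat{Y}_{\mathrm{HT}}\right) = Y \quad \text{if } \pi_i > 0 \text{ for all } i .
```

When the $\pi_i$ are unknown, as in non-probability sampling, the design weights $d_i$ cannot be computed, so this design-based estimator is not available.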

By this definition, non-probability sampling is not protected against selection bias introduced by the researcher, and it does not provide a randomization distribution on which theoretical inference can be based. Both issues therefore need to be addressed when developing theories of non-probability sampling.

Methodology of Non-probability Sampling

Little is known about non-probability sampling methods that control selection bias. Instead, once we acknowledge that selection bias exists in non-probability samples, we can consider two strategies for responding to it.

The first strategy concerns sampling mechanisms that do not affect statistical inference; under such a mechanism, the non-probability sample does not cause selection bias [11,12]. In volunteer sampling, for example, if the relevant characteristics of sample members are similar to those of non-sample members, selection bias does not arise.
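The difference between a mechanism that does not affect inference and one that does can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: a hypothetical population of a study variable y, and two hypothetical volunteer mechanisms, one where the chance of volunteering is constant and one where it grows with y. Neither mechanism nor any of the numbers comes from the text.

```python
import random
from statistics import mean


def volunteer_sample(population, participation_prob, rng):
    """Keep each unit independently with a unit-specific participation probability.

    The researcher never sees these probabilities; they only generate the sample.
    """
    return [y for y in population if rng.random() < participation_prob(y)]


rng = random.Random(2022)  # fixed seed so the run is reproducible
# Hypothetical population: 100,000 values of y drawn from a normal(50, 10) distribution.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = mean(population)

# Mechanism A: participation does not depend on y (about 5% volunteer).
ignorable = volunteer_sample(population, lambda y: 0.05, rng)


# Mechanism B: units with larger y are more likely to volunteer (also about 5% overall).
def depends_on_y(y):
    return min(1.0, max(0.0, 0.001 * y))


selective = volunteer_sample(population, depends_on_y, rng)

print(f"true mean         {true_mean:.2f}")
print(f"ignorable sample  {mean(ignorable):.2f}")
print(f"selective sample  {mean(selective):.2f}")
```

Under mechanism B the expected sample mean is $E(Y^2)/E(Y) = (10^2 + 50^2)/50 = 52$, so the selective sample should land roughly 2 units above the true mean, while the sample from mechanism A should be close to it.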

The second strategy is to adjust for selection bias during statistical inference, after the non-probability sample has been selected. Approaches of this kind can be classified into a pseudo-design-based framework and a model-based framework [5]; combinations of the two are also possible.

In the pseudo-design-based framework, the non-probability sample is treated as if it were a probability sample. However, design weights are not available, because the sampling process that produced the non-probability sample is unknown. In this framework, the unknown or undefined design weights are replaced by surrogate weights called pseudo-design weights. These pseudo-weights are usually constructed by propensity weighting [13] or calibration weighting [14]. Estimates are then calculated from the non-probability sample data using these pseudo-weights.
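As a small illustration of a pseudo-weight, the sketch below uses post-stratification, the simplest form of calibration weighting: each sampled unit in group h receives the weight N_h / n_h, so that the weighted sample reproduces known population group counts. Propensity weighting is not shown. The group labels, population counts, and answers are hypothetical, and the adjustment removes bias only to the extent that, within each group, sample members resemble non-members.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_counts):
    """Pseudo-weight for each sampled unit: N_h / n_h for its group h.

    Weighted group counts then reproduce the known population counts exactly.
    """
    n_h = Counter(sample_groups)
    missing = set(n_h) - set(population_counts)
    if missing:
        raise ValueError(f"no population count for groups: {missing}")
    # Groups with N_h > 0 but no sampled units cannot be calibrated this way.
    return [population_counts[g] / n_h[g] for g in sample_groups]


def weighted_mean(values, weights):
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Hypothetical web-panel data: the panel over-represents the "young" group.
groups = ["young"] * 6 + ["old"] * 2
answers = [1, 1, 1, 0, 1, 1, 0, 1]  # a 0/1 survey answer
population_counts = {"young": 4000, "old": 6000}  # assumed known, e.g. from a census

w = poststratification_weights(groups, population_counts)
print(sum(answers) / len(answers))  # unweighted mean: 0.75
print(weighted_mean(answers, w))    # pseudo-weighted mean: about 0.633
```

The weighted mean equals 0.4 × 5/6 + 0.6 × 1/2, i.e. each group's sample mean weighted by its population share.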

In contrast, the model-based framework uses the non-probability sample to fit a prediction model for the population. The fitted model is then used for estimation and inference on the population parameters [15,16].
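A minimal sketch of the model-based idea follows. It assumes, hypothetically, a single auxiliary variable x that is known for every population unit and a simple linear model y = a + bx fitted by least squares to the non-probability sample. The population total is estimated as the observed y values plus model predictions for the units outside the sample. This is only one simple instance of prediction-based estimation, and it is valid only insofar as the model fitted to the sample also holds for the non-sampled units.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def prediction_estimator_of_total(sample_x, sample_y, nonsample_x):
    """Observed y for sampled units plus model predictions for all other units."""
    a, b = fit_simple_ols(sample_x, sample_y)
    return sum(sample_y) + sum(a + b * xi for xi in nonsample_x)


# Hypothetical data: x is known for every population unit (e.g. from a register),
# y is observed only for units that ended up in the non-probability sample.
sample_x = [1.0, 2.0, 3.0, 4.0]
sample_y = [3.0, 5.0, 7.0, 9.0]  # here y = 1 + 2x exactly
nonsample_x = [5.0, 6.0]

print(prediction_estimator_of_total(sample_x, sample_y, nonsample_x))  # → 48.0
```

Because the toy data follow the model exactly, the predictions for the two non-sampled units are 11 and 13, and the estimated total is 24 + 24 = 48.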

Summary

Unlike the probability sampling framework, no single framework encompassing non-probability sampling has been established yet, so the non-probability sampling framework remains controversial. Nevertheless, if the dominant form of sample surveys shifts from surveys with probability samples to surveys with non-probability samples in this century, then, as with the rise of sample surveys in the previous century, the shift is likely to be driven by soaring demand for non-probability sample surveys. Following this pattern of development, more theory related to non-probability sampling will be developed and refined, and more useful research on non-probability sampling methodology can be expected.

References
